Here we will map an example dataset from Roy et al (Cell Rep 2019) comprising 12,245 CD34+ cells from fetal and adult bone marrow. Note that any human hematopoietic cells, regardless of tissue source or disease status, can be mapped to this reference. This tutorial will take approximately 10 minutes to run.

Setup

library(Seurat)
library(tidyverse)
library(symphony)
library(ggpubr)
library(patchwork)

Install package from github

## install dependencies that are not on CRAN
if(!require(BiocManager, quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("AUCell", "doMC"))
if(!require(devtools, quietly = TRUE)) install.packages("devtools")
devtools::install_github("jaredhuling/jcolors")
## install BoneMarrowMap package
devtools::install_github('andygxzeng/BoneMarrowMap', force = TRUE)
Downloading GitHub repo andygxzeng/BoneMarrowMap@HEAD

HEMATOPOIESIS - Download reference object and UMAP model

Set projection folder, Download reference object and UMAP model

# Set directory to store projection reference files
projection_path = './'

# Download Bone Marrow Reference - 344 Mb
curl::curl_download('https://bonemarrowmap.s3.us-east-2.amazonaws.com/BoneMarrow_RefMap_SymphonyRef.rds', 
                    destfile = paste0(projection_path, 'BoneMarrow_RefMap_SymphonyRef.rds'))
# Download uwot model file - 221 Mb
curl::curl_download('https://bonemarrowmap.s3.us-east-2.amazonaws.com/BoneMarrow_RefMap_uwot_model.uwot', 
                    destfile = paste0(projection_path, 'BoneMarrow_RefMap_uwot_model.uwot'))

# Load Symphony reference
ref <- readRDS(paste0(projection_path, 'BoneMarrow_RefMap_SymphonyRef.rds'))
# Set uwot path for UMAP projection
ref$save_uwot_path <- paste0(projection_path, 'BoneMarrow_RefMap_uwot_model.uwot')

Visualize Bone Marrow Reference

If we want to visualize celltype labels or metadata from the BM Reference, we can create a Seurat Object from the symphony reference This will be memory efficient as it will not include gene expression counts, only the UMAP coordinates and the metadata including cell labels and sorting information

BM_ref_obj <- create_ReferenceObject(BM_ref)

DimPlot(BM_ref_obj, reduction = 'umap', group.by = 'CellType_Annotation_formatted', raster=FALSE, label=TRUE, label.size = 4)

We can visualize other annotations too, including cell cycle phase and lineage pseudotime estimates.

p1 <- DimPlot(BM_ref_obj, reduction = 'umap', group.by = 'CyclePhase', raster=FALSE)
p2 <- FeaturePlot(BM_ref_obj, reduction = 'umap', features = 'Pseudotime', raster=FALSE) 

p1 + p2

Load Leukemia scRNA-seq data.

Let’s use these maps to project scRNA-seq data from two Ph+ B-ALL patients with distinct cell compositions (SJPHALL006_D, SJPHALL007_D)

Each patient sample was downsampled to 2500 cells to decrease runtime for this example.

query <- readRDS('ExampleQuery_BALL_scRNAseq.rds')
query
An object of class Seurat 
36601 features across 13265 samples within 1 assay 
Active assay: RNA (36601 features, 0 variable features)

Map the Query Data

Provide raw counts, metadata, and donor key. This should take <1 min Calculate mapping error and perform QC to remove low quality cells with high mapping error

# batch variable to correct in the query data, set as NULL if no batches in query
batchvar <- 'Sample'

# Map query dataset using Symphony (Kang et al 2021)
query <- map_Query(
    exp_query = query@assays$RNA@counts, 
    metadata_query = query@meta.data,
    ref_obj = BM_ref,
    vars = batchvar
)
Normalizing
Scaling and synchronizing query gene expression
Found 2360 reference variable genes in query dataset
Project query cells using reference gene loadings
Clustering query cells to reference centroids
Correcting query batch effects
UMAP
All done!
Warning: Adding features not currently present in the objectWarning: Invalid name supplied, making object name syntactically valid. New object name is Seurat..ProjectDim.RNA.harmony; see ?make.names for more details on syntax validity

In leukemia samples, the distribution of mapping error scores can vary broadly from sample to sample. In this context, we will want to threshold outliers with high mapping error on a per-sample basis. Typically, a threshold of 2, 2.5, or 3 MADs works well.

In some cases where sequencing depth is very low (e.g. older datasets from first-generation scRNA-seq protocols), a more stringent threshold of even 1.5 may be warranted to eliminate cells with low mapping quality

# Run QC based on mapping error score, flag cells with mapping error >= 2.5 MADs above median
query <- query %>% calculate_MappingError(., reference = BM_ref, MAD_threshold = 2.5, 
                                          threshold_by_donor = TRUE, donor_key = batchvar) # threshold mapping error on a per-sample basis.

# Plot distribution by patient to ensure you are catching the tail
query@meta.data %>% 
  ggplot(aes(x = mapping_error_score, fill = mapping_error_QC)) + 
  geom_histogram(bins = 200) + facet_wrap(.~get(batchvar))

# Get QC Plots
QC_plots <- plot_MappingErrorQC(query)

# Plot together - If this is too crowded, can also just call "QC_plots" aloneto display one by one
patchwork::wrap_plots(QC_plots, ncol = 4, widths = c(0.8, 0.3, 0.8, 0.3))

This important step identifies a subset of cells with high mapping error from the query dataset that are either:

Sometimes, low quality cells may erroneously map to the orthochromatic erythroblast region as this cell type has very low transcriptional diversity. These low quality query cells do not have hemoglobin expression and are in fact mis-mapped; they will be flagged by the QC filter and excluded from cell type assignments.

Please adjust the MAD_threshold (typically between 1 and 3) based on the distribution of your dataset to identify the outliers with low quality and high mapping error scores. This will improve your classifications and any downstream composition analysis

# # Optional step - remove outliers with high mapping error
# query <- subset(query, mapping_error_QC == 'Pass')

Optionally, outlier cells with high mapping error can also be removed at this stage. For ease of integrating these mapped annotations with the rest of your analysis, we can choose to skip this step. If so, Final CellType and Pseudotime predictions will be assigned as NA for cells failing the mapping error QC threshold.

Cell Type Assignments

We will next use a KNN classifier to assign cell identity based on the 30 K-Nearest Neighbours from the reference map. This label transfer step will take longer, potentially around 10 minutes for ~10,000 cells

# Predict Hematopoietic Cell Types by KNN classification
query <- predict_CellTypes(
  query_obj = query, 
  ref_obj = BM_ref, 
  initial_label = 'initial_CellType_BoneMarrowMap', # celltype assignments before filtering on mapping QC
  final_label = 'predicted_CellType_BoneMarrowMap'  # celltype assignments with map QC failing cells assigned as NA
) 

DimPlot(subset(query, mapping_error_QC == 'Pass'), reduction = 'umap', group.by = c('predicted_CellType_BoneMarrowMap'), 
        raster=FALSE, label=TRUE, label.size = 4)

Pseudotime Annotations

We can also annotate each query cell based on their position along hematopoietic pseudotime. Query cells will be assigned a pseudotime score based on the 30 K-Nearest Neighbours from the reference map. Since our Pseudotime KNN assignments are performed in UMAP space (more accurate than KNN on harmony components), this step is very fast (< 10s)

library(RColorBrewer)
# Predict Pseudotime values by KNN
query <- predict_Pseudotime(
  query_obj = query, 
  ref_obj = BM_ref, 
  initial_label = 'initial_Pseudotime',  # pseudotime assignments before filtering on mapping QC
  final_label = 'predicted_Pseudotime'   # pseudotime assignments with map QC failing cells assigned as NA
)

# Visualize Hematopoietic Pseudotime in query data
FeaturePlot(subset(query, mapping_error_QC == 'Pass'), features = c('predicted_Pseudotime'), split.by = 'Sample') & 
  scale_color_gradientn(colors = rev(brewer.pal(11, 'RdBu')))
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

Visualize Projection Density

Now let’s visualize the density distribution of query cells across the hematopoietic hierarchy

# Set batch/condition to be visualized individually
batch_key <- 'Sample'

# returns a list of plots for each donor from a pre-specified batch variable
projection_plots <- plot_Projection_byDonor(
  query_obj = query, 
  batch_key = batch_key, 
  ref_obj = BM_ref, 
  Hierarchy_only = FALSE, # Whether to exclude T/NK/Plasma/Stromal cells 
  downsample_reference = TRUE, 
  downsample_frac = 0.25,   # down-sample reference cells to 25%; reduces figure file size
  query_point_size = 0.2,   # adjust size of query cells based on # of cells
  saveplot = TRUE, 
  save_folder = 'projectionFigures/'
)

# show plots together with patchwork. Can also just call "projection_plots" object to display one-by-one
patchwork::wrap_plots(projection_plots, ncol = 2)

We can also set Hierarchy_only = TRUE to remove T/NK/Plasma/Stromal cells and focus solely on the hematopoietic hierarchy.

# Set batch/condition to be visualized individually
batch_key <- 'Sample'

# returns a list of plots for each donor from a pre-specified batch variable
projection_plots <- plot_Projection_byDonor(
  query_obj = query, 
  batch_key = batch_key, 
  ref_obj = BM_ref, 
  Hierarchy_only = TRUE, # Whether to exclude T/NK/Plasma/Stromal cells 
  downsample_reference = TRUE, 
  downsample_frac = 0.25,   # down-sample reference cells to 25%; reduces figure file size
  query_point_size = 0.2,   # adjust size of query cells based on # of cells
  saveplot = TRUE, 
  save_folder = 'projectionFigures/'
)

# show plots together with patchwork. Can also just call "projection_plots" object to display one-by-one
patchwork::wrap_plots(projection_plots, ncol = 2)

Let’s save the results of the mapping onto the complete hematopoietic hierarchy.

saveRDS(query, 'QueryData_Mapped_CompleteHematopoiesis.rds')

B CELL DEVELOPMENT - Download reference object and UMAP model

Set projection folder, Download reference object and UMAP model

# Set directory to store projection reference files
projection_path = './'

# # Download Bone Marrow Reference - 187 Mb
# curl::curl_download('https://bdevelopmentmap.s3.us-east-2.amazonaws.com/BDevelopment_RefMap_SymphonyRef.rds', 
#                     destfile = paste0(projection_path, 'BDevelopment_RefMap_SymphonyRef.rds'))
# # Download uwot model file - 99 Mb
# curl::curl_download('https://bdevelopmentmap.s3.us-east-2.amazonaws.com/BDevelopment_RefMap_uwot_model.uwot', 
#                     destfile = paste0(projection_path, 'BDevelopment_RefMap_uwot_model.uwot'))

# Load Symphony reference
bdev_ref <- readRDS(paste0(projection_path, 'BDevelopment_RefMap_SymphonyRef.rds'))
bdev_ref$umap$embedding[,1] = -bdev_ref$umap$embedding[,1] 

# Set uwot path for UMAP projection
bdev_ref$save_uwot_path <- paste0(projection_path, 'BDevelopment_RefMap_uwot_model.uwot')

Visualize Bone Marrow Reference

If we want to visualize celltype labels or metadata from the BM Reference, we can create a Seurat Object from the symphony reference This will be memory efficient as it will not include gene expression counts, only the UMAP coordinates and the metadata including cell labels and sorting information

bdev_ref_obj <- create_ReferenceObject(bdev_ref)

DimPlot(bdev_ref_obj, reduction = 'umap', group.by = 'BDevelopment_CellType', raster=FALSE, label=TRUE, label.size = 4)

Mapping query data onto B cell development

query <- readRDS('QueryData_Mapped_CompleteHematopoiesis.rds')
query
An object of class Seurat 
36601 features across 13265 samples within 1 assay 
Active assay: RNA (36601 features, 0 variable features)
 3 dimensional reductions calculated: pca, harmony, umap

Filter for celltypes along B cell development

B_development_celltypes <- c('HSC', 'HSC/MPP', 'MPP-MyLy', 'MPP-LMPP', 'LMPP', 'Early GMP', 'MLP', 'MLP-II', 'Pre-pDC', 'Pre-pDC Cycling', 'pDC', 
                    'CLP', 'EarlyProB', 'Pre-ProB', 'Pro-B VDJ', 'Pro-B Cycling', 'Large Pre-B', 'Small Pre-B', 'Immature B', 'Mature B')

query <- subset(query, predicted_CellType_BoneMarrowMap %in% B_development_celltypes)
query
An object of class Seurat 
36601 features across 11572 samples within 1 assay 
Active assay: RNA (36601 features, 0 variable features)
 3 dimensional reductions calculated: pca, harmony, umap

Map the Query Data

Provide raw counts, metadata, and donor key. This should take <1 min Calculate mapping error and perform QC to remove low quality cells with high mapping error


# Map query dataset using Symphony (Kang et al 2021)
query <- map_Query(
    exp_query = query@assays$RNA@counts, 
    metadata_query = query@meta.data,
    ref_obj = bdev_ref,
    vars = batchvar
)
Normalizing
Scaling and synchronizing query gene expression
Found 950 reference variable genes in query dataset
Project query cells using reference gene loadings
Clustering query cells to reference centroids
Correcting query batch effects
UMAP
All done!
Warning: Invalid name supplied, making object name syntactically valid. New object name is Seurat..ProjectDim.RNA.harmony; see ?make.names for more details on syntax validity
# Flip UMAP1 to go from left to right
query[['umap']]@cell.embeddings[,1] = -query[['umap']]@cell.embeddings[,1]
query[['umap']]
bdev_ref$meta_data

Cell Type Assignments

We will next use a KNN classifier to assign cell identity based on the 30 K-Nearest Neighbours from the reference map. This label transfer step will take longer, potentially around 10 minutes for ~10,000 cells

# Predict Hematopoietic Cell Types by KNN classification
query <- predict_CellTypes(
  query_obj = query, 
  ref_obj = bdev_ref, 
  ref_label = 'BDevelopment_CellType',   ## for a more detailed annotation, use BDevelopment_CellType_Comprehensive
  initial_label = 'initial_CellType_BDevelopment', # celltype assignments before filtering on mapping QC
  final_label = 'predicted_CellType_BDevelopment'  # celltype assignments with map QC failing cells assigned as NA
) 

DimPlot(subset(query, mapping_error_QC == 'Pass'), reduction = 'umap', group.by = c('predicted_CellType_BDevelopment'), 
        raster=FALSE, label=TRUE, label.size = 4)

Visualize Projection Density

Now let’s visualize the density distribution of query cells across the hematopoietic hierarchy

saveRDS(query, 'QueryData_Mapped_BcellDevelopment.rds')
---
title: "R Notebook"
output: html_notebook
---

Here we will map an example dataset from Roy et al (Cell Rep 2019) comprising 12,245 CD34+ cells from fetal and adult bone marrow.
Note that any human hematopoietic cells, regardless of tissue source or disease status, can be mapped to this reference.
This tutorial will take approximately 10 minutes to run.

### Setup

```{r}
library(Seurat)
library(tidyverse)
library(symphony)
library(ggpubr)
library(patchwork)
```

Install package from github

```{r}
## install dependencies that are not on CRAN
if(!require(BiocManager, quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("AUCell", "doMC"))
if(!require(devtools, quietly = TRUE)) install.packages("devtools")
devtools::install_github("jaredhuling/jcolors")
```
 
```{r}
## install BoneMarrowMap package
devtools::install_github('andygxzeng/BoneMarrowMap', force = TRUE)
library(BoneMarrowMap)
```

 
#### HEMATOPOIESIS - Download reference object and UMAP model

Set projection folder, Download reference object and UMAP model

```{r}
# Set directory to store projection reference files
projection_path = './'

# Download Bone Marrow Reference - 344 Mb
curl::curl_download('https://bonemarrowmap.s3.us-east-2.amazonaws.com/BoneMarrow_RefMap_SymphonyRef.rds', 
                    destfile = paste0(projection_path, 'BoneMarrow_RefMap_SymphonyRef.rds'))
# Download uwot model file - 221 Mb
curl::curl_download('https://bonemarrowmap.s3.us-east-2.amazonaws.com/BoneMarrow_RefMap_uwot_model.uwot', 
                    destfile = paste0(projection_path, 'BoneMarrow_RefMap_uwot_model.uwot'))

# Load Symphony reference
BM_ref <- readRDS(paste0(projection_path, 'BoneMarrow_RefMap_SymphonyRef.rds'))
# Set uwot path for UMAP projection
BM_ref$save_uwot_path <- paste0(projection_path, 'BoneMarrow_RefMap_uwot_model.uwot')
```


#### Visualize Bone Marrow Reference

If we want to visualize celltype labels or metadata from the BM Reference, we can create a Seurat Object from the symphony reference 
This will be memory efficient as it will not include gene expression counts, only the UMAP coordinates and the metadata including cell labels and sorting information

```{r, fig.height=5, fig.width=11}
BM_ref_obj <- create_ReferenceObject(BM_ref)

DimPlot(BM_ref_obj, reduction = 'umap', group.by = 'CellType_Annotation_formatted', raster=FALSE, label=TRUE, label.size = 4)
```

We can visualize other annotations too, including cell cycle phase and lineage pseudotime estimates.

```{r, fig.height=3.5, fig.width=11}
p1 <- DimPlot(BM_ref_obj, reduction = 'umap', group.by = 'CyclePhase', raster=FALSE)
p2 <- FeaturePlot(BM_ref_obj, reduction = 'umap', features = 'Pseudotime', raster=FALSE) 

p1 + p2
```


#### Load Leukemia scRNA-seq data.

Let's use these maps to project scRNA-seq data from two Ph+ B-ALL patients with distinct cell compositions (SJPHALL006_D, SJPHALL007_D)

Each patient sample was downsampled to 2500 cells to decrease runtime for this example. 

```{r}
curl::curl_download('https://bdevelopmentmap.s3.us-east-2.amazonaws.com/ExampleQuery_BALL_scRNAseq.rds', 
                    destfile = paste0(projection_path, 'ExampleQuery_BALL_scRNAseq.rds'))

query <- readRDS('ExampleQuery_BALL_scRNAseq.rds')
query
```


### Map the Query Data
Provide raw counts, metadata, and donor key. This should take <1 min
Calculate mapping error and perform QC to remove low quality cells with high mapping error

```{r}
# batch variable to correct in the query data, set as NULL if no batches in query
batchvar <- 'Sample'

# Map query dataset using Symphony (Kang et al 2021)
query <- map_Query(
    exp_query = query@assays$RNA@counts, 
    metadata_query = query@meta.data,
    ref_obj = BM_ref,
    vars = batchvar
)
```

In leukemia samples, the distribution of mapping error scores can vary broadly from sample to sample. In this context, we will want to threshold outliers with high mapping error on a per-sample basis. Typically, a threshold of 2, 2.5, or 3 MADs works well. 

In some cases where sequencing depth is very low (e.g. older datasets from first-generation scRNA-seq protocols), a more stringent threshold of even 1.5 may be warranted to eliminate cells with low mapping quality 

```{r, fig.height=3, fig.width=10}
# Run QC based on mapping error score, flag cells with mapping error >= 2.5 MADs above median
query <- query %>% calculate_MappingError(., reference = BM_ref, MAD_threshold = 2.5, 
                                          threshold_by_donor = TRUE, donor_key = batchvar) # threshold mapping error on a per-sample basis.

# Plot distribution by patient to ensure you are catching the tail
query@meta.data %>% 
  ggplot(aes(x = mapping_error_score, fill = mapping_error_QC)) + 
  geom_histogram(bins = 200) + facet_wrap(.~get(batchvar))
```


```{r, fig.height=3, fig.width=10}
# Get QC Plots
QC_plots <- plot_MappingErrorQC(query)

# Plot together - If this is too crowded, can also just call "QC_plots" aloneto display one by one
patchwork::wrap_plots(QC_plots, ncol = 4, widths = c(0.8, 0.3, 0.8, 0.3))
```


This important step identifies a subset of cells with high mapping error from the query dataset that are either:

* not present within the reference, or
* have poor QC metrics (low RNA counts and low transcriptional diversity)

Sometimes, low quality cells may erroneously map to the orthochromatic erythroblast region as this cell type has very low transcriptional diversity. 
These low quality query cells do not have hemoglobin expression and are in fact mis-mapped; they will be flagged by the QC filter and excluded from cell type assignments.

**Please adjust the MAD_threshold (typically between 1 and 3) based on the distribution of your dataset to identify the outliers with low quality and high mapping error scores. This will improve your classifications and any downstream composition analysis**


```{r}
# # Optional step - remove outliers with high mapping error
# query <- subset(query, mapping_error_QC == 'Pass')
```

Optionally, outlier cells with high mapping error can also be removed at this stage.
For ease of integrating these mapped annotations with the rest of your analysis, we can choose to skip this step. If so, Final CellType and Pseudotime predictions will be assigned as NA for cells failing the mapping error QC threshold. 


### Cell Type Assignments
We will next use a KNN classifier to assign cell identity based on the 30 K-Nearest Neighbours from the reference map.
This label transfer step will take longer, potentially around 10 minutes for ~10,000 cells 

```{r, fig.height=5, fig.width=10}
# Predict Hematopoietic Cell Types by KNN classification
query <- predict_CellTypes(
  query_obj = query, 
  ref_obj = BM_ref, 
  initial_label = 'initial_CellType_BoneMarrowMap', # celltype assignments before filtering on mapping QC
  final_label = 'predicted_CellType_BoneMarrowMap'  # celltype assignments with map QC failing cells assigned as NA
) 

DimPlot(subset(query, mapping_error_QC == 'Pass'), reduction = 'umap', group.by = c('predicted_CellType_BoneMarrowMap'), 
        raster=FALSE, label=TRUE, label.size = 4)
```


#### Pseudotime Annotations
We can also annotate each query cell based on their position along hematopoietic pseudotime. 
Query cells will be assigned a pseudotime score based on the 30 K-Nearest Neighbours from the reference map.
Since our Pseudotime KNN assignments are performed in UMAP space (more accurate than KNN on harmony components), this step is very fast (< 10s)

```{r}
library(RColorBrewer)
```


```{r, fig.height=4, fig.width=12}
# Predict Pseudotime values by KNN
query <- predict_Pseudotime(
  query_obj = query, 
  ref_obj = BM_ref, 
  initial_label = 'initial_Pseudotime',  # pseudotime assignments before filtering on mapping QC
  final_label = 'predicted_Pseudotime'   # pseudotime assignments with map QC failing cells assigned as NA
)

# Visualize Hematopoietic Pseudotime in query data
FeaturePlot(subset(query, mapping_error_QC == 'Pass'), features = c('predicted_Pseudotime'), split.by = 'Sample') & 
  scale_color_gradientn(colors = rev(brewer.pal(11, 'RdBu')))
```

### Visualize Projection Density

Now let's visualize the density distribution of query cells across the hematopoietic hierarchy

```{r, fig.height=3, fig.width=8}
# Set batch/condition to be visualized individually
batch_key <- 'Sample'

# returns a list of plots for each donor from a pre-specified batch variable
projection_plots <- plot_Projection_byDonor(
  query_obj = query, 
  batch_key = batch_key, 
  ref_obj = BM_ref, 
  Hierarchy_only = FALSE, # Whether to exclude T/NK/Plasma/Stromal cells 
  downsample_reference = TRUE, 
  downsample_frac = 0.25,   # down-sample reference cells to 25%; reduces figure file size
  query_point_size = 0.2,   # adjust size of query cells based on # of cells
  saveplot = TRUE, 
  save_folder = 'projectionFigures/'
)

# show plots together with patchwork. Can also just call "projection_plots" object to display one-by-one
patchwork::wrap_plots(projection_plots, ncol = 2)
```

We can also set Hierarchy_only = TRUE to remove T/NK/Plasma/Stromal cells and focus solely on the hematopoietic hierarchy.

```{r, fig.height=3, fig.width=6.5}
# Set batch/condition to be visualized individually
batch_key <- 'Sample'

# returns a list of plots for each donor from a pre-specified batch variable
projection_plots <- plot_Projection_byDonor(
  query_obj = query, 
  batch_key = batch_key, 
  ref_obj = BM_ref, 
  Hierarchy_only = TRUE, # Whether to exclude T/NK/Plasma/Stromal cells 
  downsample_reference = TRUE, 
  downsample_frac = 0.25,   # down-sample reference cells to 25%; reduces figure file size
  query_point_size = 0.2,   # adjust size of query cells based on # of cells
  saveplot = TRUE, 
  save_folder = 'projectionFigures/'
)

# show plots together with patchwork. Can also just call "projection_plots" object to display one-by-one
patchwork::wrap_plots(projection_plots, ncol = 2)
```


Let's save the results of the mapping onto the complete hematopoietic hierarchy. 

```{r}
saveRDS(query, 'QueryData_Mapped_CompleteHematopoiesis.rds')
```


#### B CELL DEVELOPMENT - Download reference object and UMAP model

Set projection folder, Download reference object and UMAP model

```{r}
# Set directory to store projection reference files
projection_path = './'

# Download Bone Marrow Reference - 187 Mb
curl::curl_download('https://bdevelopmentmap.s3.us-east-2.amazonaws.com/BDevelopment_RefMap_SymphonyRef.rds',
                    destfile = paste0(projection_path, 'BDevelopment_RefMap_SymphonyRef.rds'))
# Download uwot model file - 99 Mb
curl::curl_download('https://bdevelopmentmap.s3.us-east-2.amazonaws.com/BDevelopment_RefMap_uwot_model.uwot',
                    destfile = paste0(projection_path, 'BDevelopment_RefMap_uwot_model.uwot'))

# Load Symphony reference
bdev_ref <- readRDS(paste0(projection_path, 'BDevelopment_RefMap_SymphonyRef.rds'))
bdev_ref$umap$embedding[,1] = -bdev_ref$umap$embedding[,1] 

# Set uwot path for UMAP projection
bdev_ref$save_uwot_path <- paste0(projection_path, 'BDevelopment_RefMap_uwot_model.uwot')
```


#### Visualize Bone Marrow Reference

If we want to visualize celltype labels or metadata from the BM Reference, we can create a Seurat Object from the symphony reference 
This will be memory efficient as it will not include gene expression counts, only the UMAP coordinates and the metadata including cell labels and sorting information

```{r, fig.height=5, fig.width=11}
bdev_ref_obj <- create_ReferenceObject(bdev_ref)

DimPlot(bdev_ref_obj, reduction = 'umap', group.by = 'BDevelopment_CellType', raster=FALSE, label=TRUE, label.size = 4)
```


### Mapping query data onto B cell development

```{r}
query <- readRDS('QueryData_Mapped_CompleteHematopoiesis.rds')
query
```

Filter for celltypes along B cell development

```{r}
B_development_celltypes <- c('HSC', 'HSC/MPP', 'MPP-MyLy', 'MPP-LMPP', 'LMPP', 'Early GMP', 'MLP', 'MLP-II', 'Pre-pDC', 'Pre-pDC Cycling', 'pDC', 
                    'CLP', 'EarlyProB', 'Pre-ProB', 'Pro-B VDJ', 'Pro-B Cycling', 'Large Pre-B', 'Small Pre-B', 'Immature B', 'Mature B')

query <- subset(query, predicted_CellType_BoneMarrowMap %in% B_development_celltypes)
query
```

### Map the Query Data
Provide raw counts, metadata, and donor key. This should take <1 min
Calculate mapping error and perform QC to remove low quality cells with high mapping error

```{r}
# batch variable to correct in the query data, set as NULL if no batches in query
batchvar <- 'Sample'

# Map query dataset using Symphony (Kang et al 2021)
query <- map_Query(
    exp_query = query@assays$RNA@counts, 
    metadata_query = query@meta.data,
    ref_obj = bdev_ref,
    vars = batchvar
)

# Flip UMAP1 to go from left to right
query[['umap']]@cell.embeddings[,1] = -query[['umap']]@cell.embeddings[,1]
```

```{r}
query[['umap']]
```


```{r}
bdev_ref$meta_data
```


### Cell Type Assignments
We will next use a KNN classifier to assign cell identity based on the 30 K-Nearest Neighbours from the reference map.
This label transfer step will take longer, potentially around 10 minutes for ~10,000 cells 

```{r, fig.height=5, fig.width=10}
# Predict Hematopoietic Cell Types by KNN classification
query <- predict_CellTypes(
  query_obj = query, 
  ref_obj = bdev_ref, 
  ref_label = 'BDevelopment_CellType',   ## for a more detailed annotation, use BDevelopment_CellType_Comprehensive
  initial_label = 'initial_CellType_BDevelopment', # celltype assignments before filtering on mapping QC
  final_label = 'predicted_CellType_BDevelopment'  # celltype assignments with map QC failing cells assigned as NA
) 

DimPlot(subset(query, mapping_error_QC == 'Pass'), reduction = 'umap', group.by = c('predicted_CellType_BDevelopment'), 
        raster=FALSE, label=TRUE, label.size = 4)
```


### Visualize Projection Density

Now let's visualize the density distribution of query cells across the hematopoietic hierarchy

```{r, fig.height=3, fig.width=8}
# Set batch/condition to be visualized individually
batch_key <- 'Sample'

# returns a list of plots for each donor from a pre-specified batch variable
projection_plots_bdev <- plot_Projection_byDonor(
  query_obj = query, 
  batch_key = batch_key, 
  ref_obj = bdev_ref, 
  downsample_reference = TRUE, 
  downsample_frac = 0.25,   # down-sample reference cells to 25%; reduces figure file size
  query_point_size = 0.2,   # adjust size of query cells based on # of cells
  saveplot = TRUE, 
  save_folder = 'projectionFigures/'
)

# show plots together with patchwork. Can also just call "projection_plots" object to display one-by-one
patchwork::wrap_plots(projection_plots_bdev, ncol = 2)
```


```{r}
saveRDS(query, 'QueryData_Mapped_BcellDevelopment.rds')
```






